Abstract

The Virtual Observatory Registry is a distributed directory of information systems and other resources relevant to astronomy. 
To make it useful, facilities to query that directory must be provided to humans and machines alike. This article reviews the 
development and status of such facilities, also considering the lessons learnt from about a decade of experience with Registry 
interfaces. After a brief outline of the history of the standards development, it describes the use of Registry interfaces in some 
popular clients as well as dedicated UIs for interrogating the Registry. It continues with a thorough discussion of the design of 
the two most recent Registry interface standards, RegTAP on the one hand and a full-text-based interface on the other hand. The 
article finally lays out some of the less obvious conventions that emerged in the interaction between providers of registry records 
and Registry users as well as remaining challenges and current developments. 

Keywords: virtual observatory, registry, standards 
2000 MSC: 68U35 


1. Introduction 


In Demleitner et al. (2014), henceforth Paper I, we described
the design and maintenance of the Virtual Observatory (VO)
Registry as a distributed information system. Conceptually, it
is a collection of, by now, about 15000 registry records. To
give the Registry’s users - astronomers, the library community,
or even the general public - access to this collection, facilities
have to be provided that allow focused queries against it.
This includes common bibliographic constraints (by author, title
or abstract term, year, etc.), but also constraints specific to
a registry mainly concerned with data services (e.g., supported
protocols or query parameters, metadata of published tables).
In the design of such facilities, several challenges have to be
addressed:


1. different users have very different expectations and
requirements

2. the underlying data collection (i.e., the set of registry
records) is changing over time

3. the underlying data structure is fairly complex, and evolves
itself as new standards and techniques are introduced in
the VO

4. as many uses require only a small subset of the types of
metadata contained, partial resource descriptions should
be retrievable

5. the total data set cannot efficiently be transferred to clients
as a whole

6. registry records are frequently authored by persons not
entirely familiar with the data model, resulting in
inconsistent quality


In consequence, no single user interface to the Registry can
be sufficient. Instead, the VO community designed client
interfaces, i.e., network endpoints with rigorously defined behavior
and semantics, designed for use by programs that then present
the actual user interfaces to Registry data.

We will begin this paper with a brief review of the various
client interfaces that are or were used in the VO (section 2). In
section 3 we proceed to describe the use some selected clients
make of these facilities and the ways they apply and expose
information obtained from the registry. A major part of the paper,
section 4, is devoted to a thorough discussion of the Registry
Relational Model (RegTAP for short), one of the two registry
interfaces currently being developed and deployed in response
to the deficiencies of previous standards. The section following
it describes the other new-generation interface.

While laying out some common use cases of Registry data
in a later section, we also point out common query patterns.
The final section concludes with some speculation about probable
future developments.

In the following, we refer to common Registry standard
texts by their abbreviated names as introduced in Paper I, and
again the capitalized word “Registry” refers to the abstract
concept, while concrete services are written in lower case (e.g., a
“publishing registry”). Concepts from VOResource and its
extensions are written in small caps.


2. History 


Although only explicitly written down in 2011, the use cases
collected on the IVOA wiki (IVOA Registry WG 2011) outline
some of the challenges faced by the designers of the first
client interfaces to the registry in the mid-2000s - finding
tables containing columns with certain physics, locating services
implementing certain protocols, and the like.

While on the maintenance side of the registry the ecosystem
around OAI-PMH (Open Archives Initiative 2002) provided
guidance for many technology choices, in developing the
client interfaces much more new ground had to be broken. For
instance, the OPACs (Online Public Access Catalogs; see
Kani-Zabihi et al. for a treatment from about the time of RI1
design) established in the library community, while comparable
for the purpose of locating information resources, could not
efficiently address the use cases, and no broadly accepted
standard for client, rather than user, interfaces to OPACs lent itself
to adoption by the VO community.

Given that the interface to be designed was expected to be
expressive enough for requests of the type “find all TAP
services exposing a table having some word in the description and
a column with a given UCD¹”, it was determined fairly early on
that an interface based on simple, atomic parameters would not
be sufficient, and Registry information crucial to certain
discovery tasks would not be queryable through it. Client interfaces
making explicit too much of the underlying data model would
also unduly restrict future developments of that data model.
Thus, at least one interface to the Registry would have to
support a full query language. Since the Registry data model was
defined in XML Schema, an obvious choice for the query
language was XQuery (Robie et al. 2014), a language that
essentially extends SQL concepts to querying XML trees.

However, factors against the adoption of XQuery included: 


• the heavy use VOResource makes of XML namespaces,
which tended to make queries hard to write by hand;

• the much larger installed base of relational databases
compared to XQuery-capable engines (compounded by the
fact that translating XQuery to a given relational schema
is hard);

• the desire to open up the full registry data model to queries
written by end users, i.e., astronomers. As it was expected
that many of these would familiarize themselves
with the VO’s SQL dialect ADQL (Astronomical Data
Query Language; Ortiz et al. 2008), requiring yet another
query language for Registry access appeared undesirable.


With these considerations, it was decided to base the primary
Registry interface on conventional relational technology.

While the complex queries XQuery and ADQL allow were
needed for identified use cases, it was also acknowledged that
“Google-like” searches - more or less loose matching of words
in documents modelled as bags of words - were the dominant
mode of searching for resources outside of the VO in the
targeted user base. At least if common “comfort” features like
stemming or phrase searches are desired, this type of search is
hard or impossible to simulate through plain ADQL given its
very basic set of text search capabilities. Therefore, a keyword
search operation with significant freedom for implementors was
also defined.

¹ Unified Content Descriptors or UCDs in the VO denote physical concepts
like “angular distance” or “radio flux” in a simple formal language
(Derriere et al.).

The result of these considerations was section 2 of RI1
(Benson et al. 2009). It defines two required search operations,
Search (with constraints in ADQL) and KeywordSearch (with
operator-defined matching of keywords against an
operator-extensible minimal set of fields), as well as an optional
XQuerySearch operation. All search operations return either identifier
lists or sequences of full resource records in OAI-PMH style. In
addition, two OAI-PMH-like operations were defined: GetResource,
to obtain a resource record from an identifier, and GetIdentity,
to discover metadata about the registry service itself.

Several implementations of the standards are available;
services are provided by STScI, ESA, and AstroGrid.

As the RI1 design significantly predates the final
standardization of both ADQL (Ortiz et al. 2008) and the transport
protocol for queries and results, which was eventually defined in
the TAP standard (Dowler et al. 2010), RI1 further defined
an ad-hoc transport based on the RPC mechanism SOAP, and
it adopted ADQL at a time when experiments were underway
with passing ADQL statements to client interfaces in parsed
(XML) form. In consequence, modern TAP clients cannot use
RI1 registry endpoints, and writing queries in the aging XML
serialization of ADQL became difficult at best as the software
components translating SQL expressions into the XML forms went
unmaintained.


Further critique came from implementor feedback (e.g.,
Taylor 2010) and was collected together with the use cases (IVOA
Registry WG 2011). For instance, in practice the use of a
restricted set of XPath to specify constraints instead of defining
an actual relational schema led to severe interoperability
problems between different registries, which were further
exacerbated by not specifying rules for case folding. The apparent
flexibility towards registry extensions provided by the
XPath-based column references also did not pay off as originally
expected since registries still needed to do internal mapping as
registry extensions were developed. In contrast to the (optional)
XQuery interfaces, the (mandatory) ADQL interfaces frequently
lagged behind standards deployment.

In this situation, the most advanced Registry clients relied
on the optional XQuery interface or even used entirely
proprietary interfaces.

As TAP services entered the registry in the early 2010s,
RI1’s response format also became a liability. Registry
records contain table metadata, and with TAP services exposing
many tables, resource records of several megabytes are not
exceptional. This made relatively common queries like “Retrieve
basic metadata on all TAP services” expensive in terms of
transfer time and processing required.

Therefore, starting in 2011, it was decided to design a new
Registry interface, dubbed “RESTful” to contrast it with the
SOAP-based RI1 protocol. With TAP and ADQL now available,
a replacement of the RI1 Search operation was mainly a
matter of designing a schema and a mapping to this schema
from VOResource. This can be seen as creating a second
serialization of an abstract data model implicit in VOResource’s
XML schema files.

The combination of a defined schema and a TAP service
had a model in ObsCore (Louys et al. 2011). The resulting
new standard (“RegTAP”), discussed in section 4, is in the last
phases of IVOA peer review as this article is written.

A replacement for the KeywordSearch operation is also being
developed. Here, the wide availability of feature-rich
full-text engines such as Apache Lucene offers the possibility of
enriching the bag-of-words model and allows some advanced
operators as well. We will revisit this development in a later
section.

3. Registry Use in Clients 


Many VO clients integrate Registry access, frequently without
advertising the actual source of the data. Depending on the
scope of the application, different parts of Registry metadata
are used, and different presentations of this information appear
appropriate. In the following, we look at Registry usage in a
number of, we believe, representative applications, concluding
with an in-depth look at TOPCAT’s use of the registry.

Figure 1: TAPHandle uses registry information to provide input completion for
TAP service access URLs, where the completion items are complemented by
additional metadata.

TAPHandle² is a TAP client operated through web browsers
(Michel et al. 2014). It uses the Registry to discover all
registered TAP services. With this information, it can provide input
completion in the selector for the TAP service queried (Fig. 1),
thus facilitating simple discovery tasks (“I want to query the
CADC TAP server”). As it is a TAP client, it is natural for
TAPHandle to use RegTAP as its Registry interface. Indeed, its
use case is one of the standard tasks identified in the collection
of requirements for a revised Registry interface (IVOA Registry
WG 2011). It uses a hard-wired RegTAP endpoint, performing
essentially a single query per session within its server
component. Thus, TAPHandle users are isolated from technical details
of registry access and are also not exposed to visible registry
queries.

² Online at http://saada.unistra.fr/taphandle.

Similarly, the spectral analysis tool VOSpec (Osuna et al.
2005) queries the registry for all services implementing specific
standards (spectral and line access, in this case), but since, in
contrast to TAPHandle, it has no server-side component, it does
so directly from the user’s client, using one of two built-in
registry endpoints implementing the RI1 Search operation. Data
extraction from the registry records retrieved is performed with
an XSLT stylesheet. The discovered resources are presented to
the user in a tree view for individual selection or de-selection.
This UI is employed both in the selection of the spectral
services and in the selection of servers providing information on
the location of spectral lines.

Figure 2: SPLAT’s rendering of the metadata of spectral services in the
VO: various metadata obtained from the registry records are made selectable
through checkboxes.


Another VO-enabled spectral tool, SPLAT (Castro-Neves
and Draper 2014), takes this approach somewhat further by
exposing dataSource (from SimpleDALRegExt, allowing, for
instance, the separation of theoretical and observational services)
and WAVEBAND from VODataService via checkboxes in its UI
(Fig. 2). The UI shown is built from a simple query for all
services implementing SSAP. SPLAT furthermore allows users to
add, as it were, private registry records (e.g., for unpublished
services) that are then integrated into this interface.

A drawback of hiding actual registry queries in this way is
that metadata quality of the resource records directly influences
the user experience for the application itself. For instance, when
a resource record author neglected to give correct waveband
metadata, users who knew that a certain resource serves optical
spectra were frequently confused when the service was deselected
after they restricted queries to optical data.

The VO client Aladin (Bonnarel et al. 2000), which supports
the major VO protocols, could build upon a registry-like system
called GLU that predates the definition of the VO Registry
(Fernique et al. 2003). GLU, with automatic mirror selection and
an Aladin-customized metadata format, to this day distributes
Registry information to Aladin. Registry records enter GLU not
through a client interface but rather by harvesting an OAI-PMH
endpoint. The operators of the GLU system at CDS perform
additional curation, e.g., by removing invalid records or records
for known-defective services. By removing all resource
metadata not immediately relevant to the client, Aladin can keep,
in effect, a local cache of the entire GLU content. This would be
impractical for the actual Registry content, as it would currently
entail managing and updating several hundreds of megabytes.
Responsiveness is further enhanced by persisting this data
between executions of Aladin.

There are also clients specifically built around the Registry.
One of the most advanced to date is VOExplorer (Tedds et al.
2008), developed by the UK’s VO project AstroGrid in the late
2000s as part of its VODesktop suite. In its user interface it
guides users in the construction of constraints, mixing
menu-based selection with free-text queries as appropriate.
VOExplorer communicates with registries via the XQuery client
interface. By thus retrieving only parts of the full VOResource
record it significantly reduces network traffic compared to an RI1
Search client. When a full VOResource record is required, it is
cached for use in future query results.

Though the major discovery protocols defined while VODesktop
was still being developed are supported, the application’s
focus is clearly the integration of Registry data into a workflow,
and it offloads visualization to specialized clients by use of the
SAMP inter-application communication protocol (Taylor et al.
2012). Regrettably, VODesktop’s development ceased in 2009,
with the demise of the AstroGrid project.

To replace the comprehensive graphical Registry UI
provided by VODesktop, WIRR³ was developed. It is essentially
a browser-based query builder for RegTAP, where, much like
in VODesktop, the user can successively add constraints on
the search results. Notable constraint types include queries for
resources containing columns with specific UCDs, “inverted
queries” to obtain registry information from a service’s access
URL (which is useful for finding contact information when
services fail), and queries with regular expressions on IVORNs.
Even more than VODesktop, WIRR relies on external
applications to use the resources found, employing SAMP messages
for transmitting resource lists. TOPCAT is one application that
already supports these.
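
As a rough illustration of the “inverted query” mentioned above, the
following sketch builds the ADQL a client might send to a RegTAP
service in order to find contact information for a given access URL.
The schema name rr, the tables interface and res_role, and the columns
access_url, role_name, email and base_role are assumed here from the
RegTAP schema rather than taken from this section, and the function
name is hypothetical; the queries WIRR actually generates may well
differ.

```python
def contact_query(access_url: str) -> str:
    """Return ADQL looking up contact roles for a service access URL.

    Table and column names are assumptions about the RegTAP schema;
    check them against the service's table metadata before use.
    """
    # Quote the URL as an ADQL string literal, doubling single quotes.
    quoted = "'" + access_url.replace("'", "''") + "'"
    return (
        "SELECT DISTINCT ivoid, role_name, email\n"
        "FROM rr.interface\n"
        "  NATURAL JOIN rr.res_role\n"
        f"WHERE access_url = {quoted}\n"
        "  AND base_role = 'contact'"
    )


print(contact_query("http://dc.g-vo.org/tap"))
```

The resulting string would be submitted through any TAP client; the
sketch deliberately stops short of network access.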

To support Registry use from within custom user programs,
libraries have been written that encapsulate details of registry
access. Given the widespread adoption of Astropy (Astropy
Collaboration et al. 2013), we note here the registry functions
within the Astropy affiliated package PyVO (Graham et al.
2014). For Registry access, it contains a single function, regsearch,
that supports constraints by keywords (essentially, a full-text
search within resource record text fields), service types (e.g.,
image or spectral service) and wavebands. The function also
has a parameter to pass in custom SQL fragments executed
within the VAO’s registry. Due to the limitations of the RI1
standard client protocols, a custom, VOTable-based interface is
employed at the moment, with a change to RegTAP in the back-end
planned.


Finally, there are uses of Registry client interfaces not
directly connected to actual VO clients. As an example we
mention VO Fresh⁴, an RSS feed of metadata for services newly
published or updated in the Registry; new resources are also
announced through microblogging services. VO Fresh initially
obtained registry information from a full registry’s OAI-PMH
endpoint but moved to obtaining registry information through
RegTAP as that became available.


3.1. Case Study: TOPCAT 

TOPCAT (Taylor 2005) is a tool for analysis of
astronomical tables. Part of its function is to provide a user-friendly
GUI for acquiring tabular data from Virtual Observatory
services, most importantly TAP (Dowler et al. 2010) and Cone
Search (Williams et al. 2008), but also SIA (Tody and Plante
2009) and SSA (Tody et al. 2012). To achieve this, it needs the
Registry to locate services with the relevant capabilities and to
allow the user to assess their suitability for the science job at
hand.

From a user point of view, TOPCAT’s registry interaction
consists of selecting a particular type of data service,
optionally supplying some keywords to match against one or more of
a handful of fixed resource metadata fields, and dispatching a
search which results in presentation of basic metadata for each
matching service. The user then peruses this list and selects one
of the returned services for subsequent use in the application.


Figure 3: TOPCAT’s Registry interface: the user specifies a query using a
selector for registry service endpoint and interface protocol (RegTAP or RI1), a
keyword text field, and a set of checkboxes for what resource fields to match
against. An additional constraint is the service type, which depends on the
context these widgets are shown in. Below the input widgets is a listing of matched
resources from which one may be selected. “Accept Resource List” allows
filling the resource selector from SAMP messages.


TOPCAT makes only a single type of registry query to
support this functionality, the user interface to which is
illustrated in Fig. 3: locate all registry resources which offer a fixed
standard capability (e.g. TAP) and which satisfy zero or more
additional user search constraints (e.g. “Title contains the term
UKIDSS”), and for each one return a small fixed amount of
metadata (ID, Title, Publisher, Access URL and a few others).
There is other information stored in the Registry records that
TOPCAT may require, such as vs:CatalogService records
describing table and column metadata. However, for newer
VO protocols such information is also available from the
registered data services themselves, and TOPCAT prefers to acquire
it from the latter source, since it may be more reliable and is
also available for unregistered services.

³ Online at http://dc.g-vo.org/WIRR.

⁴ Online at http://dc.g-vo.org/regrss.

Implementing these queries in RI1 presented some
difficulties. The KeywordSearch operation is unsuitable since keyword
searches cannot be combined with restrictions on service type.
The XQuerySearch operation offers suitable functionality, but
being an optional part of the standard it is only available from
a subset of registry services (in fact, only the AstroGrid
implementation), and so would have restricted the choice of registries
with which the tool could interact. The only remaining option
is the Search operation. Syntax and semantics of the fields to
match in the required ADQL queries were somewhat
under-documented, and there is a problem with the way case
sensitivity is defined, but the most serious issue is that Search always
returns the whole, perhaps large, record for each matched
resource, the bulk of which is not needed. Patchy service
implementation quality also contributed to making RI1-based registry
interaction generally slow and unreliable.

With the introduction of RegTAP, registry interaction is much
improved. The user interface is almost unchanged, but queries
are more precise, thanks to more careful mapping of the RM
data model into its relational counterpart, and much faster, since
it is possible to restrict the query response to items of interest
only. This latter point can lead to a reduction of two orders of
magnitude in the required data transfer. To give an admittedly
drastic - but in practice not uncommon - example, the response
size for a query for all TAP services registered in May 2014
went down from roughly 25 MB to about 150 kB.

Note that although for both RI1 and RegTAP the client uses
an essentially SQL-like language to select resources, what the
user sees is a keyword-based or “Google-like” interface.
Mapping from the latter to the former can result in verbose query
text, but this text is not difficult for client code to generate.
Therefore, for this purpose there has been no requirement for
an essentially keyword-like client interface to the registry.
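
To make the mapping from a keyword-style interface to query text more
concrete, here is a minimal sketch of how client code might generate
such a query. It is not TOPCAT’s actual implementation. The schema
name rr, the columns short_name, res_title, res_description,
standard_id and access_url, the lowercasing of standard identifiers,
and the user-defined function ivo_nocasematch are assumptions about
RegTAP not stated in this section; the function names are
hypothetical, and combining keywords with AND is just one possible
policy.

```python
def adql_literal(text: str) -> str:
    """Quote text as an ADQL string literal, doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_service_query(standard_id, keywords=(), fields=("res_title",)):
    """Return ADQL selecting basic metadata for services of one type.

    Each keyword must match at least one of the given fields; all
    keywords must match (AND). Column and function names are assumed.
    """
    # Assumption: standard identifiers are stored lowercased.
    conditions = ["standard_id = " + adql_literal(standard_id.lower())]
    for word in keywords:
        pattern = adql_literal("%" + word + "%")
        alternatives = [
            f"1 = ivo_nocasematch({field}, {pattern})" for field in fields
        ]
        conditions.append("(" + " OR ".join(alternatives) + ")")
    return (
        "SELECT ivoid, short_name, res_title, access_url\n"
        "FROM rr.resource\n"
        "  NATURAL JOIN rr.capability\n"
        "  NATURAL JOIN rr.interface\n"
        "WHERE " + "\n  AND ".join(conditions)
    )


print(build_service_query(
    "ivo://ivoa.net/std/TAP", ["UKIDSS"], ["res_title", "res_description"]
))
```

Because only a handful of columns are selected, the response stays
small regardless of how large the underlying resource records are,
which is the effect described above.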

There is scope for richer interaction with the registry from
TOPCAT, for instance queries on fine-grained metadata
(column UCDs) or more detailed display of descriptive or curation
metadata from selected records. These options may be explored
in the future.


4. The Registry Relational Model 


The Registry Relational Model - briefly called RegTAP for
mainly technical reasons - is the successor to the Search method
in RI1. It essentially defines a relational schema and rules to
map VOResource records into this schema. Using TAP as an
access protocol and ADQL as the query language, this is enough
to completely define a client interface to the registry.

A sketch of this relational schema is given in Fig. 4.
Although the authors first experimented with alternative structures
that would have been derived from VOResource
algorithmically (Harrison 2011), it turned out design considerations did
not lend themselves to formalization, as discussed in the next
subsection.


4.1. Design Goals and their consequences 

In the following discussion of RegTAP’s design, the model
is derived from several partly conflicting design goals, which
are written slanted in the following. Additionally, RegTAP
names are marked up in slanted typewriter, while we
continue to write VOResource concepts in small caps.

While RegTAP attempts to represent all concepts of
VOResource that could plausibly be of use in locating resources, it is
not a full relational mapping of VOResource. An overarching
design goal was to keep the model compact. In version 1, the
model defines 13 tables, a number that would have been
significantly higher for a full mapping without proportionally adding
discovery capabilities. From VOResource and its current
extensions, the model primarily left out:


• From TAPRegExt (Demleitner et al. 2012) the
descriptions of user defined functions; these say what extra
functions are available in ADQL queries, how to call them,
and what they do. Representing this would have required
an extra table, which appeared hard to justify given that
no major discovery scenarios were found for this
metadata (it is there for TAP client use, and the TAP clients
get the information directly from the service’s
capabilities endpoint).

• Also from TAPRegExt the declaration of how clients can
upload tables into TAP services, with a similar rationale.

• From StandardsRegExt (Harrison et al. 2012) the
enumerations of the input parameters defined by the standard
itself. They do not appear valuable for discovery given
the moderate number of existing standards. Representing
them would, however, break the simple foreign key
relationship between interface and capability if they were
kept in the interface table, and would require an additional,
relatively complex table otherwise.

• Also from StandardsRegExt the detailed information on
the versions of documents issued. This would have
required an extra table, and again, given the moderate
number of standards, no credible discovery scenario is
apparent.

• VOResource’s ability to have multiple access URLs for a
single interface; this feature has essentially not been used
in practice, and keeping it would have introduced another
join in all queries for access URLs and hence the vast
majority of current Registry queries. It is planned to drop
the feature in future VOResource versions; resources that
actually need multiple access URLs for a single interface
would then have to represent each endpoint as an
interface of its own.


To leverage existing VOResource expertise, RegTAP tries
to follow VOResource names. However, again compromises
had to be made to meet some other design goals. First, as the
subject domain of VOResource partly coincides with the data
definition language of SQL, many of the terms in VOResource
are reserved words in database engines. As RegTAP also tries
to avoid requiring delimited identifiers⁵ for usability reasons
(e.g., difficult to understand parse errors resulting from
forgotten quotes), conflicting names were amended with tags
indicating entities’ roles. In this way, VOResource’s table becomes
res_table, and column table_column.

Figure 4: A sketch of the database schema of the relational registry, adapted from Demleitner (2014). The arrows indicate foreign key relationships, the “attributes”
enumerate the fields most likely interesting to clients or scientists writing queries. When joining through relationship (red) in the discovery of data collection
access services (discussed later in the paper), the green tables would be “resource-bound”, the blue ones “capability-bound”, whereas the yellow ones might be either resource-
or capability-bound depending on query semantics.

Another important design goal was to hide foreign key
relationships. This is again a usability concern - having to write
explicit join conditions would necessitate a more intimate
familiarity with the data model than can be expected from a possibly
casual user. Instead, query writers should need only to identify
columns of interest and then use NATURAL JOIN to build their
query’s FROM clause.

This implies more name mangling, as in VOResource many
elements can be children of different parents, for instance type,
NAME, DESCRIPTION. Again, disambiguation is effected using tags
prepended with an underscore, indicating the source table,
abbreviated when names would attain excessive length. Thus,
description in RESOURCE becomes res_description, whereas in
CAPABILITY it becomes cap_description. Only the two
column-like tables (table_column and intf_param) are an
exception. This implies that intf_param and table_column are
the only tables that cannot be naturally joined.
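
The natural-join convention can be sketched as a small helper that
assembles a FROM clause from a list of table names and refuses the one
combination excluded above. This is only an illustration: the schema
name rr and the function name are assumptions, and whether a given
combination of tables yields meaningful rows still depends on the
query’s semantics.

```python
# The two column-like tables share untagged column names, so they
# must not be naturally joined with each other.
NOT_JOINABLE = frozenset({"table_column", "intf_param"})


def from_clause(tables, schema="rr"):
    """Build a FROM clause that combines tables via NATURAL JOIN."""
    names = list(dict.fromkeys(tables))  # drop duplicates, keep order
    if not names:
        raise ValueError("at least one table is required")
    if NOT_JOINABLE <= set(names):
        raise ValueError(
            "table_column and intf_param cannot be naturally joined"
        )
    return "FROM " + "\n  NATURAL JOIN ".join(
        f"{schema}.{name}" for name in names
    )


print(from_clause(["resource", "capability", "interface"]))
```

No join condition appears anywhere in the generated text; the shared,
identically named key columns do that work implicitly.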

The key used for joining is obvious for all tables directly
referencing resource, as Registry semantics ensure ivoid -
the record’s IVORN - is a suitable primary key for that
table. However, RegTAP also has foreign keys into the tables
capability, interface, and res_schema, for which
VOResource does not provide suitable primary keys, as the respective
relationships are represented by lexical inclusion in XML.
RegTAP instead introduces surrogate keys, the nature of which is
implementation-defined. Hence queries should never explicitly
use them, and since the tables are naturally joinable, they have
no reason to do so. In general, as the declaration of primary and
foreign keys has no impact on service behavior, RegTAP makes
no requirements in this area but restricts itself to
recommendations.

⁵ Delimited identifiers in SQL, syntactically marked by enclosing the
identifier name in double quotes, allow using arbitrary strings as column names
and also suspend SQL’s case folding.

A further design goal requiring changes to VOResource names
is that quoting must not hurt. It is not uncommon that SQL
authors and query generators employ delimited identifiers when
they do not need to. In these cases, mixed-case column names
easily lead to execution errors that again may not be easy to
understand. Therefore, all identifiers in the standard are
completely lowercase. Internal capitalization to indicate compound
words is not uncommon in VOResource, however. In RegTAP,
compound words are joined with underscores, such that,
for instance, relationshipType becomes relationship_type.
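
The case rule alone can be written down in a few lines. The helper below is a hypothetical illustration (the name to_regtap_name is not part of any standard), and it covers only the capitalization rule, not the table tags such as res_ or cap_ discussed above.

```python
import re


def to_regtap_name(xml_name: str) -> str:
    """Turn a camelCase VOResource name into lowercase-with-underscores."""
    # Insert an underscore wherever a lowercase letter or digit is
    # directly followed by a capital, then lowercase everything.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", xml_name).lower()


print(to_regtap_name("relationshipType"))
```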

If only for reasons of ease of implementation across different
back-end database engines, it was important for us not to
grossly violate the relational model. However, an analogue of
the object-relational impedance mismatch impacts RegTAP as
well: for VOResource, being an XML application, hierarchy
and sequences are natural and easy. In a relational model, these
translate into foreign keys and extra tables and thus complicate
the schema. In order to avoid an inflation of tables, RegTAP
supports what in effect are arrays of simple strings.

These are only used where values are taken from controlled
vocabularies, specifically for level and type from content,
waveband and rights from resource, flag from column, and
queryType from interface. Here, multiple values from VOResource
are concatenated with hash characters (#). To allow reliable
querying in these columns, RegTAP services must implement
an ADQL user-defined function called ivo_hashlist_has. We
specifically did not use that pattern for subject, as its
vocabulary, while governed by a recommendation to use the IVOA
Thesaurus, is deliberately open (indeed, it is well conceivable
that, for instance, hashtags might at some point be used here),
and it stands to reason that complex queries over res_subject
will be performed when clients make use of, say, ontologies that
may themselves be represented in database tables.
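
To see why a dedicated function is preferable to a plain substring match on such columns, consider the following sketch in Python. It is not the normative definition; in particular, the case-insensitive comparison is an assumption made here, and the function is written in Python only to illustrate the membership test a service-side UDF would perform.

```python
def ivo_hashlist_has(hashlist: str, item: str) -> int:
    """Return 1 if item is one of the '#'-separated words in hashlist."""
    # Assumption: words are compared ignoring case.
    words = [word.lower() for word in hashlist.split("#")]
    return 1 if item.lower() in words else 0


# Whole list entries match; fragments of entries do not.
print(ivo_hashlist_has("optical#infrared", "infrared"))
print(ivo_hashlist_has("optical#infrared", "infra"))
```

A naive pattern such as LIKE '%infra%' would accept the second case as well, which is the kind of unreliability the function avoids.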

In a model so heavily dealing with natural language, another
violation of strict relationality is almost unavoidable: treating
text as, at least, bags of words. RegTAP therefore requires
conforming services to offer a user-defined function

ivo_hasword(txt VARCHAR(*), pat VARCHAR(*)) -> INTEGER

that returns 1 (i.e., true) at least when pat is present in txt.
Operators are urged to match pat to txt in an information-retrieval
(IR) sense (i.e., “Google-style” as document vectors). This is the
main violation of RegTAP’s design goal that different registries
yield identical results for identical queries. This violation is
regrettable, as experience shows that users are at least confused
if their familiar result lists change after a change in the registry
endpoint used by their client. However, given that IR facilities
in back-end databases are inconsistent with each other and an
independent implementation of them is nontrivial, the design
goal that the standard does not exclude a major database
back-end overrode the consistency concern.
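
A baseline without any IR machinery might look like the following sketch. It assumes that "present" is understood in a whole-word, case-insensitive sense and that a multi-word pattern matches when all of its words occur; both are assumptions of this illustration, and a real service would typically delegate to its database's full-text facilities instead.

```python
import re


def ivo_hasword(txt: str, pat: str) -> int:
    """Return 1 if every word of pat occurs as a whole word in txt."""
    words = set(re.findall(r"\w+", txt.lower()))
    wanted = re.findall(r"\w+", pat.lower())
    return 1 if wanted and all(word in words for word in wanted) else 0


title = "A survey of nearby galaxies"
print(ivo_hasword(title, "Survey"))   # whole word, case ignored
print(ivo_hasword(title, "galaxy"))   # no stemming in this baseline
```

The second call illustrates the consistency problem described above: a back-end with stemming would plausibly report a match for "galaxy", whereas this baseline does not.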

A final salient design goal is that Registry extensions are
possible without schema updates. Registry extensions change
the XML schema, and hence RegTAP would have to represent
arbitrary XML trees within a fixed relational schema if this
design goal were to be fully achieved. The result would have been
very hard to query indeed. RegTAP’s designers therefore
identified a subset of extensions that is relatively straightforward
in queries, powerful enough to satisfy foreseeable use cases, and
reasonably compact: atomic values in 1:n relationships over
either resource or capability. The result is RegTAP’s res_detail
table.

This table on the one hand references resources or
capabilities by their ivoid and, as appropriate, the surrogate key on
capability (which is NULL for items pertaining to the entire
resource). On the other hand it contains keys (detail_xpath)
and values (detail_value). The keys in this table are
essentially XPath expressions within the resource record, much like
the references in RI1 query constraints. The values are always
strings, even when the VOResource elements represented have
other types.


Thus, a data collection’s accessURL child is accessible through 
the key /accessURL, the maximum size of files returned from 
an image service (defined in SimpleDALRegExt) is retrieved as 
its decimal serialization under the key /capability/maxFileSize, 
and the authorities managed by a registry are in, if necessary 
multiple, rows with the key /managedAuthority. 
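
The key/value layout can be illustrated with a toy table in SQLite, driven from Python. All rows below are invented for the example, and the CAST used to compare sizes numerically is SQLite syntax; how a given RegTAP service supports numeric comparison of the string values is not addressed by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE res_detail (
        ivoid TEXT, cap_index INTEGER,
        detail_xpath TEXT, detail_value TEXT);
    INSERT INTO res_detail VALUES
        ('ivo://example/img', 1, '/capability/maxFileSize', '2000000'),
        ('ivo://example/img2', 1, '/capability/maxFileSize', '500000'),
        ('ivo://example/reg', NULL, '/managedAuthority', 'example'),
        ('ivo://example/reg', NULL, '/managedAuthority', 'example.sub');
""")

# Values are strings, so a numeric comparison needs an explicit cast;
# compared as strings, '500000' would sort after '2000000'.
big_files = conn.execute("""
    SELECT ivoid FROM res_detail
    WHERE detail_xpath = '/capability/maxFileSize'
      AND CAST(detail_value AS INTEGER) > 1000000
""").fetchall()

# A 1:n relationship simply yields several rows with the same key.
authorities = [row[0] for row in conn.execute("""
    SELECT detail_value FROM res_detail
    WHERE ivoid = 'ivo://example/reg'
      AND detail_xpath = '/managedAuthority'
    ORDER BY detail_value
""")]
print(big_files, authorities)
```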

When mapping existing Registry extensions, it was found
that this was sufficient to express the concepts contained, with
the exceptions outlined above. A full list of the keys from the
registry extensions published before RegTAP is given in
Demleitner et al. (2014), and future registry extensions should
specify which additional keys they define.


4.2. Addressing Particular Issues 

In going from VOResource to a relational schema, properties
of either the relational or the XML model or restrictions of
the query language forced us to introduce additional rules for
several entities. We mention some major special cases in this
subsection.


Case issues. A particular challenge in the mapping rules from
VOResource to RegTAP was case-insensitive values. For
instance, IVORNs (Plante et al. 2007), UCDs, and utypes in
current VO usage (Graham et al. 2013) all have to be compared
ignoring case. Even if ADQL had an operator for case-insensitive
string comparison, having to consider case issues in
comparisons would invite bugs in queries that are hard to detect: when
a query author forgets that a column must be compared ignoring
case, the queries might still return some records and thus
appear to work. RegTAP therefore mandates that all such values
must be lowercased during ingestion. In this way, queries
not taking into account case insensitivity will at least reliably
produce an empty result list. RegTAP further case-normalizes
other columns filled from controlled vocabularies to be as
consistent as possible. Only columns intended for presentation
(essentially the descriptions and titles, role name, and subject)
and those where case normalization might lead to ambiguities
(mainly detail_xpath and detail_value) are exempt from
normalization. Where case-normalized comparisons are desired
for such mixed-case columns, RegTAP offers a UDF
ivo_nocase_match in addition to ivo_hasword (which ignores case
as well).
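
As an illustration of what such a function has to do, here is a sketch in Python. It assumes that the pattern follows SQL LIKE conventions (% for any run of characters, _ for a single character) and that the whole value must match; these semantics are assumptions of this example rather than a quotation of the standard.

```python
import re


def ivo_nocase_match(value: str, pat: str) -> int:
    """Case-insensitive match of value against a LIKE-style pattern."""
    # Translate LIKE wildcards to a regular expression, escaping
    # every other character so it is matched literally.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pat)
    flags = re.IGNORECASE | re.DOTALL
    return 1 if re.fullmatch(regex, value, flags) else 0


# role_name is a presentation column and hence stored in mixed case.
print(ivo_nocase_match("Hubble, E.", "%hubble%"))
```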


Order. While in most parts of VOResource, the order implied
in XML trees is irrelevant and thus no particular attention is
necessary in the translation to the sets of the relational model,
creator is an exception. As it is typically used to convey
authorship information, order there matters to many data providers.
Rather than add sequencing capabilities to res_role, RegTAP adds
a column creator_seq to resource that contains a pre-formatted
author list. This has the additional benefit that clients
do not need to worry about reconciling the (correct) practice of
having one author per creator element with the (widespread)
practice of including multiple names in one element in order
to produce a flat author list at least for display purposes; any
necessary special handling happens at the registry.

QNames. Several VOResource values are really XML qualified
names (QNames). This concerns some fairly fundamental
VOResource concepts, in particular the types of resources,
interfaces, and capabilities. For instance, a query might be
interested in locating all resources that are VODataService
DataCollections. These are identified by having an XML schema
type of {http://www.ivoa.net/xml/VODataService/v1.1}DataCollection,
using the conventional notation that prepends
the namespace part of a QName in curly brackets. As this
notation is cumbersome for input, serialized XML maps the
namespace URIs to namespace prefixes. VOResource and extensions
strongly recommend the use of canonical prefixes that would,
for instance, bind the prefix “vs” to the VODataService
namespace. Hence, the above name becomes a much more
manageable vs:DataCollection. Unfortunately, the canonical
prefixes are not mandatory in VOResource, which means that
registries might use entirely different prefixes, and indeed, in
registry practice, several do.

As long as the RegTAP ingestor knows which attributes
contain QNames (and that is defined in the VOResource XML
schema files), it can, however, unify prefixes by turning the
namespace prefixes of the instance document into namespace
URIs and then translating them back into the canonical prefixes.
To ensure consistent results across registries, RegTAP requires
this prefix normalization. Essentially, the recommendation to
use canonical prefixes contained in all VOResource standards
becomes a hard requirement for RegTAP.
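
The two-step translation can be sketched as follows. The function name, the shape of the prefix mappings, and the non-canonical prefix "vods" are hypothetical; the final lowercasing reflects the lowercase type values used in the queries of this article (e.g., vs:paramhttp) and is part of this sketch's assumptions.

```python
# Canonical prefixes keyed by namespace URI (only two shown here).
CANONICAL_PREFIXES = {
    "http://www.ivoa.net/xml/VODataService/v1.1": "vs",
    "http://www.ivoa.net/xml/VOResource/v1.0": "vr",
}


def normalize_qname(qname: str, instance_prefixes: dict) -> str:
    """Rewrite prefix:local using the canonical prefix for its namespace."""
    prefix, _, local = qname.partition(":")
    # Step 1: instance prefix -> namespace URI.
    namespace_uri = instance_prefixes[prefix]
    # Step 2: namespace URI -> canonical prefix; then case-normalize.
    return f"{CANONICAL_PREFIXES[namespace_uri]}:{local}".lower()


# A record that binds a non-canonical prefix to VODataService.
record_prefixes = {"vods": "http://www.ivoa.net/xml/VODataService/v1.1"}
print(normalize_qname("vods:DataCollection", record_prefixes))
```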

5. Full-text Based Registry Interface

In parallel with the relatively complex RegTAP interface,
a successor to RI1’s KeywordSearch is also being developed.
This full-text based registry addresses the difficulty of
extracting information from the previous registry interface. As field
values describing resources are mostly text, a full-text search
engine such as the Apache Lucene library (as used in popular
server components like ElasticSearch or Apache Solr) is
suitable to index and search the contents of the registry. It
was therefore decided to develop a RESTful API using
ElasticSearch as a client interface to the Registry. The
requirements were to fulfill the registry use cases defined by the IVOA
Registry Working Group (IVOA Registry WG 2011) and to
support the web clients developed at VO-Paris Data Centre,
among them several Registry curation tools. The full
specification of the RESTful interface is currently maintained at
http://api.vo.obspm.fr/registry/. That page also
provides several examples.

5.1. Query Interface

The /search method is derived from the Search and
KeywordSearch operations of the RI1 searching interface. It allows
querying common Registry items individually or all together.
For requests specifying multiple constraints, logical AND is used
by default, but a logical OR is available by adding a parameter
“orValues” to the query string.

The initial set of fields to filter a request has been extended
to fulfill the requirements of all clients using it. One of the
VO client use cases is to find services by capability type
(Spectrum, Image, TAP, etc.) and specific words or expressions from
its description, content.subject, title, or shortName attributes.
Other useful selection criteria come from curation’s publisher,
creator.name, and contributor. The coverage is also used to
filter services in the spectral and time domains, and soon in the
spatial domain. The spatial coverage actually is an optional field
in the resource description and not well described in the service
declaration. However, this information will soon be available in
a standard way, using the HEALPix Multi-Order Coverage maps
(cf. subsection 7.3).

As the developers of the full-text search based interfaces are
also data curators in the VO Registry, some specific information
was kept and is available for selected resources:

• the IVORN of the resource record

• the dates of the publication and last update of the resource
record

• the registry where the resource has been declared and
which is responsible for this record

• information from validation.

5.2. Query Examples

A prototype service implementing the full-text query is
being maintained at VO Paris^6. The following example queries
can be executed there by passing them as URL query strings;
for reasons of readability, the query strings are shown here not
URL-encoded.

• Search all resources containing the keyword “infrared”:

keywords=infrared

• Ditto, but only return services implementing the Simple
Image Access protocol:

keywords=infrared
  +"ivo://ivoa.net/std/SIA"

• Search for all resources published by the Centre de données
de Strasbourg (CDS) implementing the Simple Cone Search
protocol, with a contentLevel of Research, and return
the 100 resources starting from match number 200:

keywords=publisher:cds
  +standardid:"ivo://ivoa.net/std/ConeSearch"
  +contentlevel:Research
  &max=100
  &from=200

• Return the full resource record for some IVORN:

identifier=ivo://vopdc.obspm/luth/exoplanet

^6 Access URL http://voparis-registry.obspm.fr/vo/ivoa/1/voresources/

5.3. Service Response

As the aim of the full-text query interface is to provide the 
simplest system for VO application developers, and most of the 
new clients are JavaScript based, the API returns query results 
formatted in JSON responses. Those responses give back the 
most useful fields as a subset of the whole service declaration 
plus a link to the original VOResource XML file in the registry. 
The subset of information returned is relatively compact and 
has proven sufficient for the clients already using the API. 
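
Since the example queries are given without URL encoding, clients have to encode them before sending. The following sketch shows one way to assemble such a request URL in Python. It assumes that the search method is reached by appending "search" to the prototype's access URL and that the parameter names are exactly those shown in the examples (keywords, max, from); the structure of the JSON response is not modelled here, and the handling of literal plus signs in multi-term keyword strings is left open.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Access URL of the prototype service; the "search" path is an assumption.
BASE_URL = "http://voparis-registry.obspm.fr/vo/ivoa/1/voresources/"


def build_search_url(keywords, max_records=None, start=None):
    """Return a URL-encoded search request for the full-text interface."""
    params = {"keywords": keywords}
    if max_records is not None:
        params["max"] = max_records
    if start is not None:
        params["from"] = start
    return BASE_URL + "search?" + urlencode(params)


url = build_search_url("publisher:cds", max_records=100, start=200)
print(url)
# Decoding the query string again recovers the unencoded form.
print(parse_qs(urlsplit(url).query))
```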

6. Common Registry Queries 

VOResource is a complex data model that sometimes
offers multiple ways of expressing apparently very similar
concepts. Driven by both registry record authoring practices and
the queries employed by popular clients, some usage patterns
have evolved that should be followed for successful Registry
use. Other patterns are recommended for ease of use. We
describe the patterns in RegTAP terms, but most would be equally
applicable to endpoints speaking XQuery, and partly even to
keyword-based services.

What is special to RegTAP is the query construction
technique. The way the schema is designed, one looks for the fields
to be constrained and for the fields to be retrieved in a schema
description such as the RegTAP specification, an implementing
service’s TAP_SCHEMA, or its VOSI table metadata. The query
can then be written by collecting all source tables, combining
them by NATURAL JOIN and treating the result as a single
table.
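
This construction technique is mechanical enough to be automated. The sketch below uses a small, hand-written mapping from column names to tables (only a few columns are listed, and the helper name build_query is made up); a real client would presumably derive the mapping from TAP_SCHEMA or the VOSI table metadata.

```python
# Hypothetical excerpt of a column-to-table mapping.
COLUMN_TABLES = {
    "ivoid": "rr.resource",
    "res_title": "rr.resource",
    "standard_id": "rr.capability",
    "access_url": "rr.interface",
    "intf_type": "rr.interface",
}


def build_query(select_cols, where_clause, constrained_cols):
    """Collect the source tables of all columns and NATURAL JOIN them."""
    tables = []
    for col in list(select_cols) + list(constrained_cols):
        table = COLUMN_TABLES[col]
        if table not in tables:
            tables.append(table)
    from_clause = " NATURAL JOIN ".join(tables)
    return (f"SELECT {', '.join(select_cols)} "
            f"FROM {from_clause} WHERE {where_clause}")


query = build_query(
    ["ivoid", "access_url"],
    "standard_id='ivo://ivoa.net/std/tap'",
    ["standard_id"])
print(query)
```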

6.1. Locating Standard Services 

A very common type of query is finding services
implementing a certain standard. In VOResource terms, it is actually
not the service but one of its capabilities that complies with a
standard. For instance, a service could at the same time
implement a cone search for telescope pointings, and two image
services each conforming to a specific version of SIAP. Each
facility is then represented as a different capability.

The capability table offers two ways to identify the kind
of interface: one could constrain cap_type or standard_id.
The correct constraint is on standard_id, as it would be
perfectly legal to register an SSA service, say, with a cap_type of
vr:capability (i.e., the minimal capability description only
consisting of a standard identifier and the interfaces). While
the record would miss essential metadata, clients should have
no trouble operating a service registered in this way, and hence
the Registry query should find it. All known clients’ queries by
service type follow the pattern of matching against the standard
identifier.

As the standard identifier is an IVORN, it needs to be
lowercased in queries for RegTAP. So, to locate all services (say, by
their IVORN and titles) having SSA interfaces, the query would
look like this:

SELECT ivoid, res_title
FROM rr.resource
  NATURAL JOIN rr.capability
WHERE standard_id='ivo://ivoa.net/std/ssa'

Other relevant standard identifiers are given in the
respective specifications or in one of the examples in RegTAP.
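
Both points made above (constrain standard_id rather than cap_type, and lowercase the identifier) can be checked against a toy capability table. The sketch uses SQLite from Python; the two records and the cap_type value for the fully described service are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE capability (
        ivoid TEXT, cap_index INTEGER, cap_type TEXT, standard_id TEXT);
    INSERT INTO capability VALUES
        ('ivo://example/full', 1, 'ssap:simplespectralaccess',
         'ivo://ivoa.net/std/ssa'),
        ('ivo://example/minimal', 1, 'vr:capability',
         'ivo://ivoa.net/std/ssa');
""")


def ids(sql, arg):
    """Run a one-parameter query and return the sorted ivoids."""
    return sorted(row[0] for row in conn.execute(sql, (arg,)))


user_input = "ivo://ivoa.net/std/SSA"
by_standard = ids(
    "SELECT ivoid FROM capability WHERE standard_id = ?",
    user_input.lower())
by_type = ids(
    "SELECT ivoid FROM capability WHERE cap_type = ?",
    "ssap:simplespectralaccess")
not_lowercased = ids(
    "SELECT ivoid FROM capability WHERE standard_id = ?", user_input)
print(by_standard, by_type, not_lowercased)
```

Only the constraint on the lowercased standard identifier finds the minimally described service; the mixed-case identifier yields the reliably empty result mentioned in subsection 4.2.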

6.2. Locating Standard Interfaces 

Locating the capability is not enough to operate a service.
In addition, the endpoint, which in VO practice is identified by
an access URL, needs to be located. A single capability can
have multiple interfaces, and while this practice is not
recommended, there are resource records that have capabilities
declaring adherence to a standard with interfaces for web browsers in
addition to the standard interface.

VOResource’s interface element has a role attribute to
distinguish the standard interfaces from custom ones. For the
former, role would contain a special string formed according to
certain rules. In practice, many resource record authors have
neglected to set role, and therefore actual clients started to
ignore it. Current VO practice therefore is to regard the
(hopefully unique) interface of type vs:ParamHTTP as the interface
exposing the standard.

Hence, the pattern to locate interfaces complying with
standards right now is, in RegTAP (this time looking for TAP
interfaces):

SELECT ivoid, access_url
FROM rr.capability
  NATURAL JOIN rr.interface
WHERE standard_id='ivo://ivoa.net/std/tap'
  AND intf_type='vs:paramhttp'

We expect this pattern to be stable, mainly because the
development of StandardsRegExt now very strongly suggests that
services supporting multiple versions of a single standard will
have to declare each such interface in a capability of its own.
Hence, distinguishing different versions by the role attribute (or
the intf_role column in RegTAP) appears dispensable, and
assuming the vs:ParamHTTP interface within a standard
capability must be the standard service endpoint is straightforward and
robust.
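
On the client side, the same convention can be applied to interface metadata that has already been retrieved. The helper below is a hypothetical sketch: it takes the interfaces of one capability as dictionaries with intf_type and access_url keys (mirroring the RegTAP column names) and insists on the "hopefully unique" vs:paramhttp interface actually being unique.

```python
def standard_access_url(interfaces):
    """Return the access URL of the single vs:paramhttp interface."""
    candidates = [
        intf["access_url"] for intf in interfaces
        if intf["intf_type"] == "vs:paramhttp"]
    if len(candidates) != 1:
        raise ValueError(
            "expected exactly one vs:paramhttp interface, "
            f"found {len(candidates)}")
    return candidates[0]


# A capability that also declares an interface for web browsers.
interfaces = [
    {"intf_type": "vr:webbrowser",
     "access_url": "http://example.org/tap/form"},
    {"intf_type": "vs:paramhttp",
     "access_url": "http://example.org/tap"},
]
print(standard_access_url(interfaces))
```

Raising an error on zero or several candidates is a design choice of this sketch; a UI might prefer to skip such records silently.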

6.3. Query by Physics 

A type of discovery query not yet widely supported in
Registry UIs is the query by physics. The fact that resource records
can and in many cases do contain table metadata giving UCDs
helps locating resources exposing a certain type of data. As
UCDs follow a grammar that ADQL does not understand, it is
frequently advisable to use wildcards in such queries. For
instance, columns containing infrared magnitudes could be found
like this:

SELECT name, ucd, column_description
FROM rr.table_column
WHERE ucd LIKE 'phot.mag;em.ir%'
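
What the wildcard does and does not catch is easy to check on a toy table. The sketch below uses SQLite from Python; the column names and UCDs are invented, with the UCDs given in the lowercased form RegTAP stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_column (name TEXT, ucd TEXT);
    INSERT INTO table_column VALUES
        ('jmag', 'phot.mag;em.ir.j'),
        ('kmag', 'phot.mag;em.ir.k;stat.mean'),
        ('vmag', 'phot.mag;em.opt.v'),
        ('e_jmag', 'stat.error;phot.mag;em.ir.j');
""")

matches = [row[0] for row in conn.execute("""
    SELECT name FROM table_column
    WHERE ucd LIKE 'phot.mag;em.ir%'
    ORDER BY name
""")]
print(matches)
```

The trailing wildcard admits any infrared band and trailing modifiers, while the error column is excluded because its UCD starts with a different word. A pattern with a leading wildcard would include it.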

To illustrate again RegTAP’s principle of natural joins, let
us show how to add a constraint on the embedding table here:

SELECT name, ucd, column_description,
  table_description
FROM rr.table_column
  NATURAL JOIN rr.res_table
WHERE 1=ivo_hasword(table_description, 'quasar')
  AND ucd LIKE 'phot.mag;em.ir%'


Constraining this further to tables accessible via TAP is then
simply a matter of combining the select list and the FROM and
WHERE clauses from subsection 6.2 with this query.

6.4. A Sketch of TOPCAT’s Query

By way of example, we present a query submitted by TOPCAT
to acquire service metadata for presentation to the user,
incorporating some user-supplied constraints. The following
ADQL would locate TAP services concerning galaxies:

SELECT ivoid, short_name, res_title,
  reference_url, base_role, role_name,
  email, intf_index, access_url,
  standard_id, cap_type, cap_description,
  std_version, res_subjects
FROM rr.resource AS res
  NATURAL JOIN rr.interface
  NATURAL JOIN rr.capability
  NATURAL LEFT OUTER JOIN rr.res_role
  NATURAL LEFT OUTER JOIN (
    SELECT
      ivoid,
      ivo_string_agg(res_subject, ', ')
        AS res_subjects
    FROM rr.res_subject GROUP BY ivoid
  ) AS sbj
WHERE
  standard_id='ivo://ivoa.net/std/tap'
  AND intf_type='vs:paramhttp'
  AND (
    1=ivo_hasword(res_title, 'galaxy')
    OR 1=ivo_hasword(res_subjects, 'galaxy'))

Note how in this query outer joins are used to make sure
rows are returned even for records that, for instance, do not
give roles. In the case of res_subject, VOResource guarantees
that at least one subject must always be present, so doing an
outer join here should not be necessary. On the other hand, in
particular in queries executed on behalf of a UI, it is good
practice to assume minor violations of VOResource will be present
in the Registry.

The sub-query for res_subjects also shows an example
for how to reduce the number of rows transferred by
server-side aggregation. Another application for this pattern could be,
using suitable strings as separators, retrieving pairs of capability
identifiers and their access URLs.

7. Open Issues

Even after the introduction of RegTAP, Registry
development is not completed. In addition to Registry extensions as
new service and resource types are defined within the VO,
several fields of work are currently actively being explored. We
discuss them here as they delineate what the Registry should be
doing but does not do so far, as well as to document approaches
tried in the VO to problems that may similarly arise in other
communities.

7.1. Data Collection and Relationships

Some TAP services today expose dozens or hundreds of
tables^7. In ObsCore services^8 (Louys et al. 2011), data from
many individual data collections are queryable through a single
endpoint. In the same way, some SIAP services make data from
several individual observatories accessible.

In all these cases, the contributing data collections should
all be present with their full metadata in the Registry. Using
GAVO’s Lens Image Archive^9 as an example, a title query
for one of the contributing data collections, MiNDSTEp, say,
should yield the full metadata for the data collection^10, and
clients should, from there, be able to infer the access URL of
the service exposing the data.

The DataCollection type of VODataService provides a type
for such cases, and through relationship (in this case, with a
relationshipType of servedBy) the associated data service can
be successfully located.

However, client support for querying through relationship
has been lacking, even in the most advanced registry clients.
WIRR at least shows the presence of related resources
explicitly, but an additional query is required to retrieve them. Also,
a query for “Image services exposing data from MiNDSTEp”
would fail unless the registry record of the embedded service
were carefully crafted. In RegTAP, writing ADQL for queries
that would simultaneously find “direct” (i.e., services exposing
exactly one data collection) and “indirect” (i.e., data collection
metadata managed separately from service metadata) services
is at least highly nontrivial (Demleitner 2013). To understand
why joining tables through relationship requires great
care, consider again the figure showing the RegTAP tables. The
colors there distinguish between “capability-bound” metadata
that in such queries would have to be queried from the service
(e.g., access URL, capability ids, accepted parameters) and
“resource-bound” metadata that needs to come from the data
collection itself (e.g., description, title, or UCDs from a
published table). The two tables that can reference both resource
and capability, shown in yellow in the figure, additionally
complicate query construction. In any case, natural joins of tables
from different groups will not produce meaningful results, thus
requiring query authors to add explicit join conditions.

These difficulties have spurred activity to consider changing
VOResource such that clients do not need to follow
relationships to locate access URLs. A reasonable solution explored
was adding (some of) the capabilities of the data service to the
resource records of the data collections themselves. That,
however, leads to an inflation of such capabilities that will make the
very common queries for all resources implementing a certain
standard as laid out in subsection 6.1 much harder to handle for
clients. For instance, to prepare for an all-VO query for images,
clients would have to filter out duplicate access URLs in order to
avoid querying a service exposing n resources n times (instead
of once).

^7 Examples for such services include ivo://org.gavo.dc/tap,
ivo://nasa.heasarc/services/xamin, as well as the TAP interface
to VizieR.
^8 The CADC TAP service ivo://cadc.nrc.ca/tap belongs in this
category.
^9 ivo://org.gavo.dc/lensunion/q/im
^10 In this case, ivo://org.gavo.dc/danish/red/data.
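
The duplicate filtering that clients would have to perform in that scenario is simple in itself, as the following sketch shows; the burden lies in every client having to do it. The function name and the example identifiers are made up, and rows are assumed to be (ivoid, access_url) pairs as returned by a query like the one in subsection 6.2.

```python
def unique_endpoints(rows):
    """Group (ivoid, access_url) rows by access URL.

    Returns a dict mapping each access URL to the list of resources
    reachable through it, so each endpoint is queried only once.
    """
    endpoints = {}
    for ivoid, access_url in rows:
        endpoints.setdefault(access_url, []).append(ivoid)
    return endpoints


rows = [
    ("ivo://example/collection1", "http://example.org/siap?"),
    ("ivo://example/collection2", "http://example.org/siap?"),
    ("ivo://example/other", "http://example.net/images?"),
]
endpoints = unique_endpoints(rows)
print(len(rows), "rows,", len(endpoints), "endpoints to query")
```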

The least burdensome solution is still to be found.
Discussions within the Registry working group are currently
investigating the use of “auxiliary” standard identifiers for the
capabilities on the data collections, which would, by lexical
convention, facilitate the discovery of either unique services (by using
the standard identifiers already in use) or all endpoints exposing
data constrained by further metadata (by using an appropriate
and index-friendly regular expression).


7.2. Education and Internationalization 

In the context of work done within the IVOA working group
on Education (Molinaro et al. 2014), the issue of
multilinguality arose. While in professional astronomy, all-English
metadata seems sufficient and, indeed, preferable, the situation is not
as clear when certain resources (in this case, educational
material) should be made discoverable for educators or even the
general public. For instance, if a worked-out use case on open
clusters is available in Italian, should it not be discoverable by
querying for “Ammasso Aperto”? This would entail allowing
the relevant text fields (title, description, possibly subject) to
be present multiple times in resource records, each element
containing text in a different language, and it would probably also
entail allowing language constraints in client interfaces to avoid
losing precision due to homographs in different languages.
Alternatively, different registries might be set up for different
languages.

So far, the Education WG only plans to allow discovering
which language specific resources are available in rather
than supporting queries in non-English languages. If, however,
takeup of Registry technologies outside of the research
community were to increase, the issue would have to be revisited,
presumably from both the client and the data model side.


7.3. Coverage in Space and Time 

VODataService allows the specification of resource
coverage, i.e., the spatial area covered on the sky as well as the ranges
in time and spectrum, in resource records. Apart from the
controlled vocabulary in waveband, this is done through
embedding STC-X (Rots 2005) within registry records. No standard
way of querying this information exists to date. An attempt
to include coverage information through four tables giving sets
of coordinate intervals per resource as proposed in Demleitner
(2012) did not gain much traction, partly because of the
flexibility and complexity of the underlying STC data model, partly
because it was felt that for spatial coverage, coordinate ranges in
the equatorial system were too inflexible to be generally useful
even for discovery purposes. An obvious example illustrating
the shortcomings would be a survey along the galactic
equator; either many ICRS ranges would have to be given, or the
coverage would be dramatically overrepresented.
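
The size of this effect can be estimated with a little spherical geometry. The sketch below assumes a survey covering a band of 5 degrees on either side of the galactic equator (an arbitrary choice for illustration) and compares its true sky fraction with that of the single enclosing ICRS range, which has to span all right ascensions and all declinations the band reaches. It uses an inclination of about 62.87 degrees between the galactic plane and the celestial equator.

```python
import math

# Inclination of the galactic plane to the celestial equator (degrees).
INCLINATION_DEG = 62.87


def band_fraction(half_width_deg):
    """Sky fraction within +/- half_width_deg of a great circle."""
    return math.sin(math.radians(half_width_deg))


def icrs_box_fraction(half_width_deg):
    """Sky fraction of the one RA/Dec range enclosing a galactic band."""
    # The band reaches declinations up to inclination + half width,
    # at all right ascensions.
    max_dec = min(90.0, INCLINATION_DEG + half_width_deg)
    return math.sin(math.radians(max_dec))


true_fraction = band_fraction(5.0)
box_fraction = icrs_box_fraction(5.0)
overrepresentation = box_fraction / true_fraction
print(f"{true_fraction:.3f} {box_fraction:.3f} {overrepresentation:.1f}")
```

Under these assumptions, a survey covering less than a tenth of the sky would be declared as covering more than nine tenths of it.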


In the meantime, multi-order coverage maps (Boch et al.
2013, MOCs) were developed as a standard way of
representing spatial coverages. Work is ongoing on how these could
be integrated into VOResource on the data model side and
exposed to the clients; if these were to be included in an extension
to RegTAP, an ideally indexable way of representing MOCs in
databases would be required, and no technically feasible
solution has been proposed so far.


8. Conclusions 

The VO Registry is an essential source of metadata about
the services and data that can be used within the VO, and no
non-trivial interaction with the VO can take place without
using its discovery capabilities. Many VO clients embed Registry
information and protocols in various forms.

By necessity, standardization of the Registry protocols
occurred relatively early in the history of the VO. While standards
on the server side have held out very well, the early standards
on the client side have, in the meantime, proved insufficient for
today’s advanced Registry use.

This resulted in the creation of second-generation client
interfaces. In this article, we have discussed principles and design
goals of the two currently developed interfaces. The keyword
search interface provides a simple language to constrain results
that accommodates users’ habits in taking up patterns from
general search engines, with a response format designed for easy
integration into browser-based applications. RegTAP, on the
other hand, is a relatively faithful mapping of essentially the
entire data model to a relational database schema, targeted
towards “thick” clients and expert users writing ADQL queries
by hand.

While RegTAP goes much further than RI1 in defining the
mapping between the XML schema that defines the Registry
data model and the relational model that is in practice used to
represent the data set in queryable form, there are still some
small areas where the relational schema and its XML
counterpart do not precisely match each other’s expressiveness,
precluding, for instance, roundtrip ingestion and recreation of
registry records through a RegTAP tableset. We propose as a
lesson to be learned from this that future data modelling efforts
should be done in an implementation-neutral language with
well-defined and well-understood mappings to the common
implementation languages. Within the VO, an effort is underway to
enable this (Lemson et al. 2014).

This is not to say that the Registry model needs
fundamental work or a technology switch any time soon. The new
interfaces to the Registry expose its functionality fairly completely
and interoperably between their implementations, and building
on proven technologies like TAP and JSON, they also lower
the cost of integrating VO registry information into client
programs.

Some open issues remain in the registry’s client interface;
the most urgent ones are probably the formulation of constraints
on spatial coverage and the handling of capabilities associated
with data collections.